perm filename CNTOUR[0,BGB]4 blob
sn#114000 filedate 1974-08-06 generic text, type C, neo UTF8
{⊂C;<N;αIMAGE CONTOURING.;λ30;P70;I425,0;JCFA} SECTION 7.
{JCFD} VIDEO IMAGE CONTOURING.
{λ10;W250;JAFA}
7.0 Introduction to Image Analysis.
7.1 CRE - An Image Processing System.
7.2 Thresholding.
7.3 Contouring.
7.4 Polygon Nesting.
7.5 Contour Segmentation.
7.6 Related and Future Image Analysis.
{λ30;W0;I950,0;JU;FA}
⊂7.0 Introduction to Image Analysis.⊃
Simply put, image analysis is the inverse of image synthesis.
From this point of view, the usually difficult question of "analysis
into what ?" is answered by the answer to the question "synthesis
from what ?". Since a 3-D geometric model is adequate (and necessary)
for synthesizing digital television pictures, it is reasonable to
suppose that a 3-D geometric model is an appropriate subgoal in the
analysis of television pictures. Such an analysis into a 3-D model
would provide a useful data reduction as well as a convenient
representation for solving robotics problems such as manipulation,
navigation and recognition. This approach to image analysis is
somewhat heretical, the orthodox method being to extract features
from 2-D images which are used directly to perform the desired task.
On the other hand, vision by inverse computer graphics may be viewed
as an extreme form of feature finding, involving the coherent
extraction of a set of basic geometric features which are combined to
form a 3-D model, which is a super feature. The rest of this
introduction enumerates some of the kinds of information available in
a sequence of images and some of the kinds of data structures for
representing images. For a first definition, an image is a 2-D data
structure representing the contents of a rectangle from the pattern
of light formed by a thin lens; a sequence of images in time is
called a film.
Three basic kinds of information in an image are photometric
information, geometric information, and topological information.
Fundamentally, geometry concerns distance measure. The geometry of
an image is based on coordinate pairs that are associated with the
elements that form the image. From the coordinates such geometric
properties as length, area, angle and moments can be computed.
Photometry means light measure, although physical measurements of
light may include power, hue, saturation, polarization and phase;
only the relative power between points of the same image is easily
available to a computer using a television camera. The taking of
color images is possible at Stanford by means of filters; however,
the acquisition of color is inconvenient and has not been seriously
pursued in the present work. Finally, topology has to do with
neighborhoods, what is next to what; topological data may be
explicitly represented by pointers between related entities, or
implicitly represented by the format of the data.
Three basic kinds of image data structures are the raster,
the contour map and the mosaic. A raster image is a two dimensional
integer valued array of pixels; a pixel ("picture element") is a
single sample position on a vidicon. Although the real shape of a
pixel is probably that of a blunt ellipse, the fiction that pixels
tessellate the image into little rectangles will be adopted. For other
theoretical purposes the array is assumed to be formed by sampling
and truncating a two argument, smooth, infinitely differentiable real
valued function. A contour image is like a geodesic contour map, no
two contours ever cross and all contours close. A mosaic image (or
tessellation) is like a ceramic tile mosaic, no two regions ever
overlap and the whole image is completely covered with tiles. Further
useful restrictions might be made concerning whether it is permitted
to have tiles with holes surrounding smaller tiles or whether it is
permitted for a tile to have points that are thinner than a single
pixel.
Given a raster image, the classical visual analysis approach
is to find the features. One canonical geometric image feature is
called the <edge> and the places where edges are not found are called
<regions>. For a naive start, an edge can be defined as a locus of
change in the image function. Edges and regions are complementary
sides of the same slippery concept; the concept is slippery because
although a continuous function of two variables and a graph of edges
are well known mathematical objects, the conversion of one into the
other is a poorly understood process that depends greatly on one's
motives and resources. A sophisticated definition of the region/edge
notion must include a procedure for converting a raster approximation
of an image function into a region/edge representation based on
parameters which are conceptually elegant.
⊂7.1 CRE - An Image Processing Sub-System.⊃
The acronym CRE stands for "Contour, Region, Edge". CRE is
a solution to the problem of finding contour edges in a sequence of
television pictures and of linking corresponding edges and polygons
from one picture to the next. The process is automatic and is
intended to run without human intervention. Furthermore, the process
is bottom up; there are no inputs that anticipate the content of the
given television images. The output of CRE is a 2-D contour map data
structure which is suitable input to the 3-D modeling program,
GEOMED. Five design choices that determine the character of CRE are:
{λ10;}
1. Dumb vision rather than model driven vision.
2. Multi image analysis rather than single image analysis.
3. Total image structure imposed on edge finding; rather
than separate edge finder and image analyzer.
4. Automatic rather than interactive.
5. Machine language rather than higher level language.
{λ30;}
\The design choices are ordered from the more strategic to
the more tactical; the first three choices being research
strategies, the latter two choices being programming tactics.
Adopting these design choices leads to image contouring and contour
map structures similar to those of [Krakauer] and [Zahn].
The first design choice does not refer to the issue of how
model dependent a finished general vision system will be (it will be
quite model dependent), but rather to the issue of how one should
begin building such a system. The best starting
points are at the two apparent extremes of nearly total knowledge of
a particular visual world or nearly total ignorance. The first
extreme involves synthesis (by computer graphics) of a predicted 2-D
image, followed by comparing the predicted and a perceived image for
slight differences which are expected but not yet measured. The
second extreme involves analyzing perceived images into structures
which can be readily compared for near equality and measured for
slight differences; followed by the construction of a 3-D geometric
model of the perceived world. The point is that in both cases images
are compared, and in both cases the 3-D model initially (or finally)
contains specific numerical data on the geometry and physics of the
particular world being looked at.
The second design choice, of multi image analysis rather than
single image analysis, provides a basis for solving for camera
positions and feature depths. The third design choice solves (or
rather avoids) the problem of integrating an edge finder's results
into an image. By using a very simple edge finder, and by accepting
all the edges found, the image structure is never lost. This design
postpones the problem of interpreting photometric edges as physical
edges. The fourth choice is a resolution to write an image processor
that does not require operator assistance or parameter tuning. The
final design choice of using machine language was for the sake of
implementing node link data structures that are processed 100 times
faster
than LEAP, 10 times faster than compiled LISP and that require
significantly less memory than similar structures in either LISP or
LEAP. Furthermore machine code assembles and loads faster than higher
level languages; and machine code can be extensively fixed and
altered without recompiling.
It is my impression that CRE itself does not raise any new
scientific problems; nor does it have any extremely new solutions to
the old problems; rather CRE is another competent video region edge
finding program with its own set of tricks. However, it is further
my impression that the particular tricks for nesting and comparing
polygons in CRE are original programming techniques. As a part of the
larger feedback system, CRE is a necessary, but not entirely
satisfactory implementation of pure bottom up image analysis.
CRE consists of five steps: thresholding, contouring,
nesting, smoothing and comparing. Thresholding, contouring and
smoothing perform conversions between two different kinds of images.
Nesting and comparing compute topological relationships: nesting
within a given image, comparing between successive images of the
film. In summary the major operations and
operands are:
{JV;λ10;}MAJOR OPERATION. OPERAND. RESULT.
1. THRESHOLDING: 6-BIT-IMAGE, 1-BIT-IMAGES.
2. CONTOURING: 1-BIT-IMAGES, VIC-IMAGE.
3. NESTING: VIC-IMAGE, NESTED-VIC-IMAGE.
4. SMOOTHING: VIC-IMAGE, ARC-IMAGE.
5. COMPARING: IMAGE & FILM, FILM.
{JU;λ30;}
The initial operand is a 6-bit video raster, which in the
present implementation is coerced into a window of 216 rows by 288
columns; intermediate operands consist of 1-bit rasters named PAC,
VSEG and HSEG which are explained below, as well as a raster of links
named SKY which is used to perform the polygon nesting. The final
result is a node/link structure composed of several kinds of nodes:
vectors, arcs, polygons, lamtens, levels, images and the film node.
Although the natural order of operations is sequential from
image thresholding to image comparing; in order to keep memory size
down, the first four steps are applied one intensity level at a time
from the darkest cut to the lightest cut (only nesting depends on
this sequential cut order); and comparing is applied to whole images.
Figure 7.1 illustrates an initial video image and its corresponding
contour image. The contoured image consists of thirteen intensity
levels and took 45 seconds to compute (on a PDP-10, 2-usec memory).
The final CRE data structure was composed of 1996 nodes.
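The sequencing described above can be sketched as a driver loop. This
is a Python skeleton only (real CRE is PDP-10 machine code); the five
step functions are passed in as stubs, and the data shapes are
illustrative, not CRE's actual structures.

```python
def cre(image, film, cuts, thresh, contour, nest, smooth, compare):
    """Skeleton of the CRE control flow. The first four steps run
    once per intensity cut, darkest (lowest) cut first -- only
    nesting depends on that order; comparing runs on whole images."""
    contour_image = []                   # the accumulating polygon structure
    for cut in sorted(cuts):             # darkest cut first
        pac = thresh(image, cut)         # 1. 6-bit image -> 1-bit image
        polygons = contour(pac)          # 2. border bits -> vector polygons
        nest(polygons, contour_image)    # 3. place into the nested tree
        polygons = smooth(polygons)      # 4. vector rings -> arc rings
        contour_image.extend(polygons)
    compare(contour_image, film)         # 5. link polygons across the film
    return contour_image
```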
⊂7.2 Thresholding.⊃
Thresholding, the first and easiest step of CRE, consists of
two subroutines, called THRESH and PACXOR. THRESH converts a 6-bit
image into a 1-bit image with respect to a given threshold cut level
between zero for black and sixty-three for light. All pixels equal to
or greater than the cut, map into a one; all the pixels less than the
cut, map into zero. The resulting 1-bit image is stored in a bit
array of 216 rows by 288 columns (1728 words, 36 bits per word)
called the PAC (picture accumulator) which was named in memory of
McCormick's ILLIAC-III. After THRESH, the PAC contains blobs of
bits. A blob is defined as "rook's move" simply connected; that is
every bit of a blob can be reached by horizontal or vertical moves
from any other bit without having to cross a zero bit or having to
make a diagonal (bishop's) move. Blobs may of course have holes; or,
equivalently, a blob always has one outer perimeter polygon, and may
have one, several or no inner perimeter polygons. This blob and hole
topology is recoverable from the CRE data structure and is built by
the nesting step.
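The THRESH mapping itself reads directly as code. The following is a
Python illustration only; the real THRESH packs 36 pixels per PDP-10
word rather than handling them one at a time.

```python
def thresh(image, cut):
    """Map a 6-bit image (pixel values 0 for black to 63 for light)
    to a 1-bit image: pixels >= cut map to one, the rest to zero."""
    return [[1 if pixel >= cut else 0 for pixel in row] for row in image]
```

For example, a row of pixels [0, 16, 32, 48] thresholded at cut 32
keeps only the two brightest positions as ones.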
Next, PACXOR copies the PAC into two slightly larger bit
arrays named HSEG and VSEG. Then the PAC is shifted down one row and
exclusive OR'ed into the HSEG array; and the PAC is shifted right one
column and exclusive OR'ed into the VSEG array to compute the
horizontal and vertical border bits of the PAC blobs. Notice that
technically this is the very heart of the <edge finder> of CRE;
namely, PACXOR is the mechanism that converts regions into edges,
rasters into contours. Further notice that edge tracing is the only
operation CRE performs on the 1-bit rasters; although Boolean image
processing has caught the eye of many vision programmers (perhaps
because it resembles an array automaton or the game Life) one rapidly
discovers that raster operations alone are too weak to do anything
interesting that can't already be done better analytically in a
raster of n-bit numbers or topologically in a node/link data
structure.
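The shift-and-XOR border extraction can be sketched as follows. This
is bit-at-a-time Python for clarity; CRE does the same thing with
full-word Boolean operations on packed rasters.

```python
def pacxor(pac):
    """Compute the border bits of the blobs in a 1-bit raster.
    HSEG = PAC xor (PAC shifted down one row): horizontal borders.
    VSEG = PAC xor (PAC shifted right one column): vertical borders.
    Each array is one row or one column larger than the PAC."""
    rows, cols = len(pac), len(pac[0])
    hseg = [[0] * cols for _ in range(rows + 1)]
    vseg = [[0] * (cols + 1) for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            hseg[r][c] ^= pac[r][c]        # the PAC copied in ...
            hseg[r + 1][c] ^= pac[r][c]    # ... xor itself shifted down
            vseg[r][c] ^= pac[r][c]        # the PAC copied in ...
            vseg[r][c + 1] ^= pac[r][c]    # ... xor itself shifted right
    return hseg, vseg
```

A single isolated one-bit blob thus leaves two border bits in HSEG
(its top and bottom edges) and two in VSEG (its left and right edges).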
⊂7.3 Contouring.⊃
Contouring converts the bit arrays HSEG and VSEG into
vectors and polygons. The contouring itself is done by a single
subroutine named MKPGON, make polygon. When MKPGON is called, it
looks for the uppermost left non-zero bit in the VSEG array. If the
VSEG array is empty, MKPGON returns a NIL. However, when the bit is
found, MKPGON traces and erases the polygonal outline to which that
bit belongs and returns a polygon node with a ring of vectors. The
MKPGON trace can go in four directions: north and south along vertical
columns of bits in the VSEG array, or east and west along horizontal
rows of the HSEG array. The trace starts by heading south until it
hits a turn; when heading south MKPGON must check first for a turn to
the east (indicated by a bit in HSEG); next for no turn (continue
south); and last for a turn to the west. When a turn is encountered
MKPGON creates a vector node representing the run of bits between the
previous turn and the present turn. The trace always ends heading
westbound. The outline so traced can be either the edge of a blob
or of a hole; the two cases are distinguished by looking at the
VIC-polygon's uppermost left pixel in the PAC bit array.
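The turn-priority rule can be sketched for one heading; the other
three headings are symmetric. The array indexing below is an
illustrative assumption, not CRE's actual bit layout.

```python
def step_south(hseg, vseg, row, col):
    """Decide the next heading while tracing south at (row, col):
    check first for a turn east (a bit in HSEG), next for no turn
    (continue south), and last for a turn west."""
    if hseg[row + 1][col]:      # a horizontal border bit: turn east
        return 'EAST'
    if vseg[row + 1][col]:      # the vertical run continues: keep south
        return 'SOUTH'
    return 'WEST'               # neither bit present: turn west
```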
There are two complexities: contrast accumulation and
dekinking. The contrast of a vector is defined as (QUOTIENT
(DIFFERENCE (Sum of pixel values on one side of the vector)(Sum of
pixel values on the other side of the vector)) (length of the vector
in pixels)). Since vectors are always either horizontal or vertical
and are construed as being on the cracks between pixels; the
specified summations refer to the pixels immediately to either side
of the vector. Notice that this definition of contrast will always
give a positive contrast for the vectors of a blob and a negative contrast
for the vectors of a hole.
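The contrast definition reads directly as code. In this sketch the
two argument lists stand for the pixels immediately to either side of
the vector's crack positions.

```python
def vector_contrast(side_a, side_b):
    """Contrast of a horizontal or vertical vector: the difference
    of the pixel sums on its two sides, divided by the length of the
    vector in pixels."""
    assert len(side_a) == len(side_b)   # one pixel per crack position
    return (sum(side_a) - sum(side_b)) / len(side_a)
```

A vector with the brighter pixels on side_a thus gets a positive
contrast; swapping the sides negates it, matching the blob/hole sign
convention above.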
The contours that have just been traced would appear
"sawtoothed" or "kinky"; the terms "kink", "sawtooth" and "jaggy" are
used to express what seems to be wrong about the lowermost left
polygon in figure 7.2. The problem involves doing something to a
rectilinear quantized set of segments, to make its continuous nature
more evident. In CRE, the jaggies solution (in the subroutine MKPGON)
merely positions the turning locus diagonally off its grid point
a little in the direction (northeast, northwest, southwest or
southeast) that bisects the turn's right angle. The distance of
dekink vernier positioning is always less than half a pixel; but
greater for brighter cuts and less for the darker cuts; in order to
preserve the nesting of contours. The sawtoothed and the dekinked
versions of a polygon have the same number of vectors. I am very
fond of this dekinking algorithm because of its incredible
efficiency; given that you have a north, south, east, west polygon
trace routine (which handles image coordinates packed row, column
into one word); then dekinking requires only one more ADD instruction
execution per vector!
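The one-ADD trick can be imitated in Python: with row,,column packed
in the two 18-bit halves of a 36-bit word, as on the PDP-10, a single
full-word addition of a packed constant nudges both coordinates at
once. In this sketch the coordinates are assumed to be kept in scaled
sub-pixel units so that the vernier offsets are integers; that
scaling, and the names, are illustrative.

```python
WORD = 1 << 36                     # 36-bit PDP-10 word
HALF = 1 << 18                     # 18-bit half per coordinate

def pack(row, col):
    """Pack row,,col into one word; carries and borrows propagate
    between the halves exactly as a full-word ADD propagates them."""
    return (row * HALF + col) % WORD

def unpack(word):
    """Recover (row, col), reinterpreting the halves as signed."""
    row, col = divmod(word % WORD, HALF)
    if col >= HALF // 2:
        col -= HALF
        row += 1
    if row >= HALF // 2:
        row -= HALF
    return row, col

# One packed constant per diagonal vernier (in sub-pixel units);
# dekinking a turn vertex is then one addition.
VERNIER = {'NE': pack(-1, 1), 'NW': pack(-1, -1),
           'SW': pack(1, -1), 'SE': pack(1, 1)}

def dekink(vertex_word, direction):
    return (vertex_word + VERNIER[direction]) % WORD
```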
⊂7.4 Polygon Nesting.⊃
The nesting problem is to decide whether one contour polygon
is within another. Although easy in the two polygon case, solving
the nesting of many polygons with respect to each other becomes
n-squared expensive in either compute time or in memory space. The
nesting solution in CRE sacrifices memory for the sake of greater
speed and requires a 31K array, called the SKY. CRE's accumulation of
a properly nested tree of polygons depends on the order of threshold
cutting going from dark to light. For each polygon there are two
nesting steps: first, the polygon is placed in the tree of nested
polygons by the subroutine INTREE; second, the polygon is placed in
the SKY array by the subroutine named INSKY.
The SKY array is 216 rows of 289 columns of 18-bit pointers.
The name "SKY" came about because the array used to represent the
furthest away regions or background, which in the case of a robot
vehicle is the real sky blue. The sky contains vector pointers; and
would be more efficient on a virtual memory machine that didn't
allocate unused pages of its address space. Whereas most computers
have more memory containers than address space; computer graphics and
vision might be easier to program in a memory with more address space
than physical space; i.e. an almost empty virtual memory.
The first part of the INTREE routine finds the surrounder of
a given polygon by scanning the SKY due east from the uppermost left
pixel of the given polygon. The SON of a polygon is always its
uppermost left vector. After INTREE, the INSKY routine places
pointers to the vertical vectors of the given polygon into the sky
array. The second part of the INTREE routine checks for and fixes up
the case where the new polygon captures a polygon that is already
enclaved. This only happens when two or more levels of the image have
blobs that have holes. The next paragraph explains the arcane
details of fixing up the tree links of multi level hole polygons; and
may be skipped by everyone but those who might wish to implement a
polygon nester.
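The surrounder scan of INTREE can be sketched as follows. The SKY
layout here is a hypothetical simplification: each cell holds a
polygon/vector pair or nothing, where the real SKY holds 18-bit
vector pointers.

```python
def find_surrounder(sky, row, col):
    """Scan the SKY due east from a polygon's uppermost left pixel;
    the first vector pointer hit belongs to the surrounding polygon."""
    for c in range(col + 1, len(sky[row])):
        if sky[row][c] is not None:
            polygon, vector = sky[row][c]
            return polygon
    return 'BACKGROUND'        # ran off the image: a top level polygon
```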
Let the given polygon be named Poly; and let the surrounder
of Poly be called Exopoly; and assume that Exopoly surrounds several
enclaved polygons called "endo's", which are already in the nested
polygon tree. Also, there are two kinds of temporary lists named the
PLIST and the NLIST. There is one PLIST which is initially a list of
all the ENDO polygons on Exopoly's ENDO ring. Each endo in turn has
an NLIST which is initially empty. The subroutine INTREE re-scans
the sky array for the polygon due east of the uppermost left vector
of each endo polygon on the PLIST (Exopoly's ENDO ring). On such
re-scanning, (on behalf of say an Endo1), there are four cases:
{λ10;W100,1160;}
\1. No change; the scan returns Exopoly;
which is Endo1's original EXO.
\2. Poly captures Endo1;
the scan returns Poly indicating that Endo1 has been captured by Poly.
\3. My brother's fate; the scan hits an Endo2 which
is not on the PLIST; which means that Endo2's EXO is valid
and is also the valid EXO of Endo1.
\4. My fate delayed; the scan hits an Endo2 which is still
on the PLIST; which means that Endo2's EXO is not yet
valid but when discovered it will also be Endo1's EXO;
so Endo1 is CONS'ed into Endo2's NLIST.
{λ30;W0,1260;}
\When an endo polygon's EXO has been re-discovered, then all the
polygons on that endo's NLIST are also placed into the polygon tree
at that place. All of this link crunching machinery takes half a page
of code and is not frequently executed.
⊂7.5 Contour Segmentation.⊃
In CRE the term <segmenting> refers to the problem of
breaking a 1-D manifold (a polygon) into simple functions (arcs). The
segmenting step converts the polygons of vertical and horizontal
vectors into polygons of arcs. For the present the term "arc" means
"linear arc" which is a line segment. Fancier arcs (circular and
cubic spline) were implemented and thrown out, mostly because they were
of no use to higher processes such as the polygon compare which would
have to break the fancy arcs back down into linear vectors for
computing areas, inertia tensors or mere display buffers.
Segmenting is applied to each polygon of a level. To start,
a ring of two arcs is formed (a bi-gon) with one arc at the uppermost
left and the other at the lowermost right of the given vector
polygon. Next a recursive make arc operation, MKARC, is applied to
the two initial arcs. Since the arc given to MKARC is in a one to one
correspondence with a doubly linked list of vectors; MKARC checks to
see whether each point on the list of vectors is close enough to the
approximating arc. MKARC returns the given arc as good enough when
all the sub vectors fall within a given width; otherwise MKARC splits
the arc in two and places a new arc vertex on the vector vertex that
was furthest away from the original arc.
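The recursive split can be sketched as follows; it is in effect the
method now commonly called Ramer-Douglas-Peucker splitting. This
sketch works over lists of coordinate pairs rather than CRE's rings
of vector nodes.

```python
def mkarc(points, width):
    """Return the vertices kept when approximating the point list by
    arcs: accept the chord if every interior vertex lies within the
    width tolerance, otherwise split at the furthest vertex."""
    if len(points) <= 2:
        return list(points)
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = (dx * dx + dy * dy) ** 0.5 or 1.0
    # perpendicular distance of each interior vertex from the chord
    dist = [abs(dy * (x - x0) - dx * (y - y0)) / length
            for x, y in points[1:-1]]
    worst = max(range(len(dist)), key=dist.__getitem__)
    if dist[worst] <= width:
        return [points[0], points[-1]]    # this arc is good enough
    split = worst + 1                     # split at the furthest vertex
    left = mkarc(points[:split + 1], width)
    right = mkarc(points[split:], width)
    return left + right[1:]               # merge, sharing the split vertex
```

A loose tolerance collapses a shallow bump to a single chord; a tight
tolerance forces one split and keeps the bump's apex as an arc vertex.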
The two large images in figure 7.3 illustrate a polygon
smoothed with arc width tolerances set at two different widths in
order to show one recursion of MKARC. The eight smaller images
illustrate the results of setting the arc width tolerance over a
range of values. Because of the dekinking mentioned earlier the arc
width tolerance can be equal to or less than one pixel and still
yield a substantial reduction in the number of vectors it takes to
describe a contour polygon.
A final important detail is that the arc width tolerance is
actually taken as a function of the highest contrast vector found
along the arc; so that high contrast arcs are smoothed with much
smaller arc width tolerances than are low contrast arcs. After
smoothing, the contrast across each arc is computed and the ring of
arcs replaces the ring of vectors of the given polygon. (Polygons
that would be expressed as only two arcs are deleted).{Q}
⊂7.6 Related and Future Image Analysis.⊃
Image analysis simply consists of three steps: first, high
quality pictures are taken continuously in time and space; second,
several low level bulk operations are applied to each image and to
pairs of images such as correlation, filtering, histogramming and
thresholding; third, the rasters are converted into linked 2-D
structures which can be further linked to form a coherent 3-D
geometric model. It is clear to me that my present implementation
has only a few of the necessary parts; more kinds of image features
and larger coherent structures have to be included. In particular,
the contour maps should be bundled into regional mosaics, and more
features, such as parallax edges and high variance windows, should
be woven into the node/link structure.
Contour image processing is effectively surveyed by [Freeman]
who gives the impression that contour images are the only
line-drawing representation. Contours are applied to recognition of
silhouettes by [Dudani] using moments similar to those explained in
the next chapter. Finally, my own acquaintance with the contour image
representation was initially derived from papers by [Zahn] and
[Krakauer].